-Xmx4g ApplyBQSR \
-I ${i}.bam \
-R ${ref} \
--bqsr-recal-file ../BQSR/${i}.table \
-O ../applyBQSR/${i}.bqsr.bam
done
cd ..
4.2.2.2.10 Variant calling
After BQSR of the BAM files, we can perform variant calling on each sample using the
GATK4 “HaplotypeCaller” tool, which identifies active regions of variation, constructs
candidate haplotypes using a de Bruijn-like graph, and then uses a Bayesian model to
call variants, as described above. With the “-ERC GVCF” option, HaplotypeCaller generates
a Genomic Variant Call Format (gVCF) file for each sample. A gVCF stores information for
both variant and non-variant sites. It can represent genotypes, annotations, and other
information across all sites in the genome in a compact format. Storing each sample's
variants in gVCF format makes it easy to consolidate variants across samples.
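To make the distinction between variant and non-variant records concrete, the following is a minimal sketch, not part of the original workflow. It assumes that non-variant stretches appear in the gVCF as reference-block records whose ALT column is exactly “<NON_REF>”, which is how HaplotypeCaller writes them in GVCF mode. The function name count_gvcf_records and the three demonstration records are hypothetical.

```shell
# count_gvcf_records: read uncompressed gVCF text on stdin and report how many
# records are reference blocks (ALT is exactly <NON_REF>) versus variant sites.
count_gvcf_records() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    $5 == "<NON_REF>" { nref++; next }     # non-variant reference block
    { nvar++ }                             # anything else is a variant site
    END { printf "reference_blocks=%d variant_sites=%d\n", nref, nvar }
  '
}

# Demonstration on three made-up records (hypothetical values).
demo_counts=$(
  {
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n'
    printf 'chr21\t10000\t.\tA\t<NON_REF>\t.\t.\tEND=10050\tGT:DP:GQ\t0/0:30:90\n'
    printf 'chr21\t10051\t.\tG\tT,<NON_REF>\t500.6\t.\tDP=32\tGT:DP:GQ\t0/1:32:99\n'
    printf 'chr21\t10052\t.\tC\t<NON_REF>\t.\t.\tEND=10100\tGT:DP:GQ\t0/0:29:87\n'
  } | count_gvcf_records
)
echo "$demo_counts"
# prints: reference_blocks=2 variant_sites=1
```

On a real file, the same function could be fed with “gzip -dc sample.g.vcf.gz | count_gvcf_records”, where sample.g.vcf.gz is a placeholder file name.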
The following script uses HaplotypeCaller to generate a gVCF file for each sample. Notice
that since we are targeting only chromosome 21, we use the “-L chr21” option to restrict
variant calling to that chromosome. Also notice that the chromosome label may be different
(e.g., 21, chromosome21); therefore, inspect the BAM file header to check the correct
chromosome names. The “-ERC GVCF” option tells HaplotypeCaller to emit a gVCF rather than
a regular VCF.
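# Create the output directory for the per-sample gVCF files
mkdir gvcf
cd applyBQSR
ref=$(ls ../refgenome/*.fasta)
# Loop over the recalibrated BAM files; ${i} is the file name without ".bam"
for bam in *.bam;
do
i=${bam%.bam}
~/software/gatk-4.2.3.0/gatk \
--java-options "-Xmx10g" \
HaplotypeCaller \
-I "${i}.bam" \
-R "${ref}" \
-L chr21 \
-ERC GVCF \
-O "../gvcf/${i}.g.vcf.gz"
done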
cd ..
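The label passed to “-L” in the script above has to match a contig name recorded in the BAM header. As a minimal sketch of how those names could be listed, the helper below extracts the SN field from the @SQ lines of header text read on stdin. The function name list_contigs and the demonstration header are hypothetical. On real data the header would typically come from “samtools view -H”, assuming samtools is installed.

```shell
# list_contigs: print the contig names (SN fields) found in SAM/BAM header text.
# Reads the header on stdin, for example from: samtools view -H sample.bam
list_contigs() {
  awk -F'\t' '
    $1 == "@SQ" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) print substr($i, 4)   # drop the "SN:" prefix
    }
  '
}

# Demonstration on a small, made-up header (illustrative contents only).
demo_contigs=$(
  {
    printf '@HD\tVN:1.6\tSO:coordinate\n'
    printf '@SQ\tSN:chr21\tLN:46709983\n'
    printf '@SQ\tSN:chr22\tLN:50818468\n'
  } | list_contigs
)
echo "$demo_contigs"
```

If the names printed for a real BAM file were, say, 21 rather than chr21, the “-L” argument would need to be changed to match.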
4.2.2.2.11 Consolidating variants across samples
The above script used HaplotypeCaller to generate a gVCF file for each sample. The next step
is to use the GenomicsDBImport tool to import the single-sample gVCFs into GenomicsDB and
then to use the GenotypeGVCFs tool to consolidate variants across the samples into a single
VCF file. For GenomicsDBImport, an input gVCF file is passed through the “-V” option. For multiple gVCF